importance plot


Multi-modal Machine Learning Analysis of X-ray Absorption Near-Edge Spectra and Pair Distribution Functions: Performance and Interpretability towards Experimental Design

arXiv.org Artificial Intelligence

We used off-the-shelf interpretable ML techniques to combine information from multiple heterogeneous spectra: X-ray absorption near-edge spectra (XANES) and atomic pair distribution functions (PDFs), to extract information about local structure and chemistry of transition metal oxides. This approach enabled us to analyze the relative contributions of the different spectra to different prediction tasks. Specifically, we trained random forest models on XANES, PDF, and both of them combined, to extract charge (oxidation) state, coordination number, and mean nearest-neighbor bond length of transition metal cations in oxides. We find that XANES-only models tend to outperform the PDF-only models for all the tasks, and information from XANES often dominated when the two inputs were combined. This was even true for structural tasks where we might expect PDF to dominate. However, the performance gap closes when we used species-specific differential PDFs (dPDFs) as the inputs instead of total PDFs. Our results highlight that XANES contains rich structural information and may be further developed as a structural probe. Our interpretable, multimodal approach is quick and easy to implement when suitable structural and spectroscopic databases are available. This approach provides valuable insights into the relative strengths of different modalities for a practical scientific goal, guiding researchers in their experiment design tasks such as deciding when it is useful to combine complementary techniques in a scientific investigation.
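The abstract's multimodal comparison trains random forests on each input alone and on the concatenated inputs, then asks which modality the combined model relies on. Below is a minimal sketch of that workflow. It assumes synthetic stand-ins for XANES and PDF vectors (50 grid points each) and a made-up target that, by construction, depends mostly on XANES features. It sums scikit-learn's impurity-based `feature_importances_` within each modality's block, which is one simple way to measure "relative contribution"; the paper's exact attribution method may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_xanes, n_pdf = 400, 50, 50

# Synthetic stand-ins for the two modalities (NOT real spectra): each row is one
# "spectrum" sampled on a fixed energy grid (XANES) or r-grid (PDF).
xanes = rng.normal(size=(n_samples, n_xanes))
pdf = rng.normal(size=(n_samples, n_pdf))

# Hypothetical target that, by construction, depends mostly on two XANES points
# and only weakly on one PDF point.
y = 2.0 * xanes[:, 10] + 1.0 * xanes[:, 20] + 0.5 * pdf[:, 5] + rng.normal(scale=0.1, size=n_samples)


def fit_and_score(X, y):
    """Fit a random forest and return (model, held-out R^2)."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X_tr, y_tr)
    return model, model.score(X_te, y_te)


inputs = {"XANES": xanes, "PDF": pdf, "XANES+PDF": np.hstack([xanes, pdf])}
results = {name: fit_and_score(X, y) for name, X in inputs.items()}

# In the combined model, sum the importances that fall inside each modality's block.
importances = results["XANES+PDF"][0].feature_importances_
share = {"XANES": importances[:n_xanes].sum(), "PDF": importances[n_xanes:].sum()}

for name, (_, r2) in results.items():
    print(f"{name:10s} held-out R^2 = {r2:.3f}")
print("importance share in combined model:", {k: round(float(v), 3) for k, v in share.items()})
```

Because the target here is constructed to be XANES-driven, the XANES block should dominate; with real spectra the split is exactly what the analysis is meant to reveal.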


Analyzing Consumer Reviews for Understanding Drivers of Hotels Ratings: An Indian Perspective

arXiv.org Artificial Intelligence

In the internet era, almost every business entity is trying to establish a footprint in digital and social media. For these entities, word of mouse (online word of mouth) is also very important. This is particularly crucial for the hospitality sector, which deals with hotels, restaurants, etc. Consumers do read other consumers' reviews before making final decisions, so it becomes very important to understand which aspects weigh most in consumers' minds when they give their ratings. The current study focuses on consumer reviews of Indian hotels to extract the aspects that matter for the final ratings. The study gathers data using web scraping, analyzes the texts using Latent Dirichlet Allocation (LDA) for topic extraction and sentiment analysis for aspect-specific sentiment mapping, and finally uses a Random Forest to assess the importance of each aspect in predicting a user's final rating.


Advanced Pipelines with scikit-learn

#artificialintelligence

Figure 1 shows what we would like to have at the end of this article. In the following, we will implement each of these steps. In step 5, we apply hyperparameter optimization and create a feature importance plot. EDA, feature building, maximizing the model's performance, analyzing and interpreting the outcome are not in the scope of this article. The goal is to show you how to work with a pipeline that integrates modules from different packages.


Neural Networks for Latent Budget Analysis of Compositional Data

arXiv.org Machine Learning

Compositional data are non-negative data collected in a rectangular matrix with a constant row sum. Because of the non-negativity, the focus is on conditional proportions that add up to 1 for each row. A row of conditional proportions is called an observed budget. Latent budget analysis (LBA) assumes a mixture of latent budgets that explains the observed budgets. LBA is usually fitted to a contingency table, where the rows are levels of one or more explanatory variables and the columns are the levels of a response variable. In prospective studies, only the explanatory variables of individuals are known, and the interest lies in predicting the response variable. Thus, a form of LBA with predictive functionality is needed. Previous studies proposed a constrained neural network (NN) extension of LBA that was hampered by unsatisfactory predictive ability. Here we propose LBA-NN, a feed-forward NN model that yields an interpretation similar to LBA but gives LBA better predictive ability. A stable and plausible interpretation of LBA-NN is obtained through importance plots and an importance table, which show the relative importance of all explanatory variables on the response variable. An LBA-NN-K-means approach, which applies K-means clustering to the importance table, produces K clusters that are comparable to the K latent budgets in LBA. We report several experiments in which LBA-NN is implemented and compared with LBA. In our analysis, LBA-NN outperforms LBA in prediction in terms of accuracy, specificity, recall and mean square error. We provide open-source software on GitHub.
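To make "observed budget" and "mixture of latent budgets" concrete, the sketch below turns a contingency table into row-wise conditional proportions and shows the classical LBA structure, where each observed budget is approximated by a convex combination of K latent budgets: p(j | i) ≈ Σ_k α(i, k) β(k, j). The counts and the α, β values are made up and not fitted; this illustrates classical LBA only, not the LBA-NN model or its importance table.

```python
import numpy as np

# Hypothetical contingency table: rows = levels of an explanatory variable,
# columns = levels of the response variable (counts are made up).
table = np.array([
    [30, 10, 10],
    [5, 25, 20],
    [20, 20, 10],
    [2, 8, 40],
], dtype=float)

# Observed budgets: conditional proportions p(j | i); each row sums to 1.
budgets = table / table.sum(axis=1, keepdims=True)

# Latent budget structure with K = 2 (illustrative values, NOT a fitted model):
# alpha[i, k] = mixing weight of latent budget k for row i (rows sum to 1)
# beta[k, j]  = latent budget k over response levels (rows sum to 1)
alpha = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
    [0.6, 0.4],
    [0.0, 1.0],
])
beta = np.array([
    [0.60, 0.30, 0.10],
    [0.05, 0.35, 0.60],
])

# A convex mixture of budgets is itself a budget, so every fitted row sums to 1.
fitted = alpha @ beta
sse = float(((budgets - fitted) ** 2).sum())

print(np.round(budgets, 3))
print(np.round(fitted, 3))
print("sum of squared residuals:", round(sse, 4))
```

An actual LBA fit would choose α and β to minimize such a discrepancy (for example by least squares or maximum likelihood) under the same sum-to-one constraints.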


Using Machine Learning to Predict and Explain Employee Attrition

#artificialintelligence

Employee turnover (attrition) is a major cost to an organization, and predicting turnover is at the forefront of the needs of Human Resources (HR) in many organizations. Until now, the mainstream approach has been to use logistic regression or survival curves to model employee attrition. However, with advancements in machine learning (ML), we can now get both better predictive performance and better explanations of which critical features are linked to employee attrition. In this post, we'll explain how we used the automated machine learning function from H2O to develop a predictive model that is in the same ballpark as commercial products in terms of ML accuracy. We'll also explain how we applied the new LIME package, which enables the breakdown of complex, black-box machine learning models into variable importance plots. Some costs are tangible, such as training expenses and the time it takes from when an employee starts to when they become a productive member.
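The core idea behind LIME can be shown without the package itself: perturb one employee's record, ask the black-box model for predictions on the perturbations, weight them by closeness to the original record, and fit a weighted linear model whose coefficients act as a local variable importance. The sketch below is a simplified from-scratch illustration, not the LIME package's API and not H2O AutoML. The features, the synthetic attrition rule, and the kernel width (modeled on the 0.75 × sqrt(number of features) heuristic) are all assumptions, and the real tabular LIME also discretizes features by default.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

rng = np.random.default_rng(42)
n = 1000
feature_names = ["overtime", "job_satisfaction", "years_at_company"]  # hypothetical features
X = np.column_stack([
    rng.integers(0, 2, n),   # overtime: 0/1
    rng.integers(1, 5, n),   # job satisfaction: 1..4
    rng.uniform(0, 20, n),   # tenure in years
]).astype(float)

# Synthetic rule (assumption): overtime and low satisfaction raise attrition risk.
logit = -1.0 + 2.0 * X[:, 0] - 0.8 * (X[:, 1] - 2.5) - 0.1 * X[:, 2]
y = rng.random(n) < 1 / (1 + np.exp(-logit))

black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)


def explain_locally(model, x, X_train, n_samples=2000, seed=0):
    """LIME-style local surrogate: perturb x, weight samples by proximity,
    and fit a weighted linear model to the black-box probabilities."""
    local_rng = np.random.default_rng(seed)
    scale = X_train.std(axis=0)
    z = local_rng.normal(size=(n_samples, x.size))  # perturbations in standardized units
    probs = model.predict_proba(x + z * scale)[:, 1]  # P(attrition) for each perturbation
    kernel_width = 0.75 * np.sqrt(x.size)
    weights = np.exp(-(z ** 2).sum(axis=1) / kernel_width ** 2)  # closer samples count more
    surrogate = Ridge(alpha=1.0).fit(z, probs, sample_weight=weights)
    return surrogate.coef_  # local effect of a 1-std change in each feature


coef = explain_locally(black_box, X[0], X)
for name, c in sorted(zip(feature_names, coef), key=lambda p: -abs(p[1])):
    print(f"{name:18s} {c:+.3f}")
```

Sorting by absolute coefficient gives the per-employee analogue of the variable importance plots the post describes.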


Finding Common Characteristics Among NBA Playoff and Championship Teams: A Machine Learning Approach

arXiv.org Machine Learning

In this paper, we employ machine learning techniques to analyze seventeen seasons (1999-2000 to 2015-2016) of NBA regular season data from every team to determine the common characteristics among NBA playoff teams. Each team was characterized by 26 predictor variables and one binary response variable taking the value "TRUE" if the team made the playoffs and "FALSE" if it missed them. An initial classification tree was fitted to this problem and then pruned, which decreased the test error rate. Further, a random forest of classification trees was grown, which provided a very accurate model from which a variable importance plot was generated to determine which predictor variables had the greatest influence on the response variable. This work concluded that the most important factors in characterizing a team's playoff eligibility are the number of assists per game by the team's opponents, the number of two-point shots per game made by the team's opponents, and the team's number of steals per game. This suggests that defensive factors, rather than offensive factors, are the most important characteristics shared among NBA playoff teams. We then use neural networks to classify championship teams based on regular season data. From this, we show that the most important factor in a team not winning a championship is the number of three-point shots per game made by the team's opponents. This again implies that defensive characteristics are of great importance in determining a team's playoff eligibility, and one can further conclude that a lack of perimeter defense negatively impacts a team's championship chances in a given season. Further, it is shown that made two-point shots and defensive rebounding are by far the most important factors in a team's chances of winning a championship in a given season.
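The tree-then-forest workflow in the abstract (fit a classification tree, prune it to reduce test error, then grow a random forest and rank predictors by importance) can be sketched with scikit-learn as below. The data are synthetic: about 500 rows with 26 standard-normal predictors and a binary "made playoffs" response that, by construction, depends on predictors 0 to 2, so nothing here reproduces the paper's findings. Pruning uses minimal cost-complexity pruning with the penalty chosen by 5-fold cross-validation, which is one standard choice and an assumption about how the authors pruned.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(7)
n_rows, n_predictors = 500, 26  # hypothetical team-seasons x predictors
X = rng.normal(size=(n_rows, n_predictors))
# Synthetic response (assumption, not the paper's data): playoffs driven by predictors 0..2.
y = (X[:, 0] - X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=n_rows)) > 0

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# 1) Unpruned classification tree.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# 2) Cost-complexity pruning: pick the penalty alpha with the best cross-validated accuracy.
path = full_tree.cost_complexity_pruning_path(X_tr, y_tr)
alphas = path.ccp_alphas[:-1]  # the last alpha prunes the tree down to a single leaf
cv_scores = [
    cross_val_score(DecisionTreeClassifier(ccp_alpha=a, random_state=0), X_tr, y_tr, cv=5).mean()
    for a in alphas
]
best_alpha = alphas[int(np.argmax(cv_scores))]
pruned_tree = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=0).fit(X_tr, y_tr)

# 3) Random forest and its variable importance ranking.
forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
ranking = np.argsort(forest.feature_importances_)[::-1]

print("test error, full tree:  ", round(1 - full_tree.score(X_te, y_te), 3))
print("test error, pruned tree:", round(1 - pruned_tree.score(X_te, y_te), 3))
print("test error, forest:     ", round(1 - forest.score(X_te, y_te), 3))
print("top predictors:", [f"x{i}" for i in ranking[:3]])
```

Impurity-based importances can favor high-cardinality predictors; permutation importance on held-out data is a common cross-check when ranking factors.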